A Framework for Word Spotting In Scanned Urdu Documents by Exploiting the Dot Orientation
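Authors: Muhammad Shafi, Faisal Iqbal, Iftikhar Ahmed Khan, Muhammad Irfan Khattak, Mohammad Saleem, Naeem Khan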
Abstract
Urdu is one of the most widely used languages in the world, and character recognition and word-spotting algorithms are needed so that Urdu literature can be made easily accessible and searchable for the Urdu-reading population. Although a sizeable body of research exists on character recognition, very few articles have been published on word spotting in Urdu. Unlike English, in which only two letters carry dots, 17 of the 38 letters in Urdu have dots either above or beneath them. This paper presents a data reduction framework that exploits dot orientation for word spotting in scanned Urdu documents. Applying the proposed scheme greatly reduces the number of eligible candidates for the target word. As demonstrated in the Results and Analysis section, the proposed algorithm shows promising results, with an average data reduction rate of 79.8%. [Muhammad Shafi, Faisal Iqbal, Iftikhar Ahmed Khan, Muhammad Irfan Khattak, Mohammad Saleem, Naeem Khan. A Framework for Word Spotting In Scanned Urdu Documents by Exploiting the Dot Orientation. Life Sci J 2013; 10(7s): 1163-1171]. (ISSN: 1097-8135). http://www.lifesciencesite.com
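The abstract does not spell out the filtering procedure, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each word image has already been reduced to a hypothetical signature (number of dots above and below the main stroke) and keeps only the candidates whose signature matches the query's. The names `DotSignature`, `filter_candidates` and `reduction_rate`, and the exact-match rule, are assumptions introduced for this sketch.

```python
from typing import NamedTuple, Sequence


class DotSignature(NamedTuple):
    """Hypothetical per-word descriptor: dot counts relative to the main stroke."""
    above: int
    below: int


def filter_candidates(query: DotSignature,
                      candidates: Sequence[DotSignature]) -> list[int]:
    """Return the indices of candidates whose dot signature equals the query's.

    Exact matching is an assumption made for illustration only.
    """
    return [i for i, sig in enumerate(candidates) if sig == query]


def reduction_rate(total: int, remaining: int) -> float:
    """Percentage of candidates discarded by the filter."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - remaining) / total


if __name__ == "__main__":
    # Made-up signatures for five word images on a page.
    words = [DotSignature(1, 0), DotSignature(0, 2), DotSignature(1, 0),
             DotSignature(3, 1), DotSignature(0, 0)]
    kept = filter_candidates(DotSignature(1, 0), words)
    print(kept, reduction_rate(len(words), len(kept)))
```

A filter of this kind would only narrow the search; the surviving candidates would still need a shape-based matcher to confirm or reject them.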
Similar resources
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. The problem is challenging, mainly because of the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
A Survey on Various Word Spotting Techniques for Content Based Document Image Retrieval
Searching documents for information and retrieving relevant documents is a basic activity. Various tools are readily available for searching and retrieval from digital documents, but few robust methods are available for retrieval from historic documents and old manuscripts, as they are not digitized but available only in scanned formats. Conventional way of retrieval from scanned document imag...
Politeness Orientation in Social Hierarchies in Urdu
The present research investigates how the politeness of Urdu speakers is influenced by their relative social status in society. The researcher took the politeness theory of Brown and Levinson (1978, 1987) as a model. To observe the politeness of Urdu speakers, the speech act of apology with different strategies was selected. A Discourse Completion Task (DCT) was used as an instrument to...
Spotting words in handwritten Arabic documents
The design and performance of a system for spotting handwritten Arabic words in scanned document images is presented. Three main components of the system are a word segmenter, a shape based matcher for words and a search interface. The user types in a query in English within a search window, the system finds the equivalent Arabic word, e.g., by dictionary look-up, locates word images in an inde...
Word Spotting in Scanned Tamil Land Documents using K-Nearest Neighbor
Word spotting is a technique that can extract text from an input image. Here, we implement it on scanned Tamil land documents. Using Gabor features, we extract feature values from the input image. The main goal is to recognize the text in the document using a K-nearest neighbor classifier. The features were calculated and combined. Using these features, we can classify and rec...